In recent years, drivers' visual attention has been actively studied for driving automation technology. However, few models provide an insightful understanding of a driver's attention across various moments. Existing attention models process multi-level image representations with a two-stream or multi-stream network, which increases the computational cost because of the larger number of model parameters. However, multi-level image representations such as optical flow play a vital role in tasks involving videos. Therefore, to reduce the computational cost of a two-stream network while still using multi-level image representations, this work proposes a single-stream driver's visual attention model for critical situations. The experiment was conducted using a publicly available critical driving dataset named BDD-A. Qualitative results confirm the effectiveness of the proposed model. Moreover, quantitative results highlight that the proposed model outperforms state-of-the-art visual attention models according to CC and SIM. Extensive ablation studies examine the presence of optical flow in the model, the position of optical flow in the spatial network, the convolution layers used to process optical flow, and the computational cost compared to a two-stream model.
Wenxin DONG Jianxun ZHANG Shuqiu TAN Xinyue ZHANG
In the pork fat content detection task, traditional physical or chemical methods are strongly destructive, have substantial technical requirements and cannot achieve nondestructive detection without slaughtering. To solve these problems, we propose a novel, convenient and economical method for detecting the fat content of pig B-ultrasound images based on hybrid attention and multiscale fusion learning, which extracts and fuses shallow detail information and deep semantic information at multiple scales. First, a deep learning network is constructed to learn the salient features of fat images through a hybrid attention mechanism. Then, the information describing pork fat is extracted at multiple scales, and the detailed information expressed in the shallow layers and the semantic information expressed in the deep layers are fused. Finally, a deep convolutional network is used to predict the fat content, which is compared with the real label. The experimental results show that the determination coefficient is greater than 0.95 on a data set of 130 groups of pork B-ultrasound images, which is 2.90, 6.10 and 5.13 percentage points higher than that of VGGNet, ResNet and DenseNet, respectively. This indicates that the model can effectively identify pig B-ultrasound images and predict the fat content with high accuracy.
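The abstract reports a determination coefficient without stating how it is computed. The following is a minimal sketch assuming the standard definition, R^2 = 1 - SS_res / SS_tot; the function name `r_squared` and the sample values are hypothetical and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, assuming the standard definition."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the labels around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical fat-content labels and predictions, for illustration only.
    labels = [12.0, 15.5, 18.2, 21.0]
    preds = [12.4, 15.0, 18.5, 20.6]
    print(r_squared(labels, preds))
```

A value of 1 means the predictions match the labels exactly, and a value of 0 means they do no better than always predicting the label mean.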
Xianyu WANG Cong LI Heyi LI Rui ZHANG Zhifeng LIANG Hai WANG
Visual object tracking is always a challenging task in computer vision. During the tracking, the shape and appearance of the target may change greatly, and because of the lack of sufficient training samples, most of the online learning tracking algorithms will have performance bottlenecks. In this paper, an improved real-time algorithm based on deep learning features is proposed, which combines multi-feature fusion, multi-scale estimation, adaptive updating of target model and re-detection after target loss. The effectiveness and advantages of the proposed algorithm are proved by a large number of comparative experiments with other excellent algorithms on large benchmark datasets.
Wenrong XIAO Yong CHEN Suqin GUO Kun CHEN
An attention residual network with a triple feature as input is proposed to predict the remaining useful life (RUL) of bearings. First, channel attention and spatial attention are connected in series within the residual connection of the residual neural network to obtain a new attention residual module, so that the newly constructed deep learning network can better attend to weak changes in the bearing state. Second, the “triple feature” is used as the input of the attention residual network, so that the deep learning network can better grasp the trend of the bearing's running state and better predict the RUL of the bearing. Finally, the method is verified on a set of experimental data. The results show that the method is simple and effective, has high prediction accuracy, and reduces manual intervention in RUL prediction.
Conghui LI Quanlin ZHONG Baoyin LI
In recent years, applications of deep learning have facilitated the development of green intelligent transportation systems (ITS), and carbon dioxide estimation has been one of the important issues in green ITS. Furthermore, carbon dioxide estimation can be modelled as fuel consumption estimation. Therefore, a clustering-based neural network is proposed that analyzes clusters in accordance with fuel consumption behaviors and obtains the estimated fuel consumption and the estimated carbon dioxide. In experiments, the mean absolute percentage error (MAPE) of the proposed method is only 5.61%, and the proposed method performs better than the other methods.
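For reference, the MAPE metric quoted above can be sketched as follows, assuming the usual definition (the mean of |actual - predicted| / |actual|, expressed as a percentage). The function name `mape` and the sample values are hypothetical; the abstract does not give the paper's evaluation code.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (standard definition assumed)."""
    # Every actual value must be non-zero, otherwise the ratio is undefined.
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical fuel consumption values, for illustration only.
    measured = [100.0, 200.0]
    estimated = [90.0, 220.0]
    print(mape(measured, estimated))
```

Because each error is divided by the measured value, MAPE is scale-free, which makes it convenient for comparing estimators across trips with very different fuel consumption.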
Automatic modulation recognition (AMR) plays a critical role in modern communication systems. Owing to the recent advancements of deep learning (DL) techniques, the application of DL has been widely studied in AMR, and a large number of DL-AMR algorithms with high recognition rates have been developed. Most DL-AMR models achieve high recognition accuracy but are large and complex, with numerous parameters, which makes them hard to deploy on resource-constrained platforms, such as satellite platforms. Conversely, some lightweight, low-complexity DL-AMR models struggle to meet accuracy requirements. To address this, this paper proposes a lightweight DL-AMR model with a high recognition rate, called Lightweight Densely Connected Convolutional Network (DenseNet) Long Short-Term Memory network (LDLSTM). By cascading DenseNet and LSTM, the model achieves the same recognition accuracy as other advanced DL-AMR algorithms with only 1/12 of their parameter volume. Thus, it is advantageous to deploy LDLSTM in resource-constrained systems.
Daiki NISHIYAMA Kazuto FUKUCHI Youhei AKIMOTO Jun SAKUMA
In real world applications of multiclass classification models, misclassification in an important class (e.g., stop sign) can be significantly more harmful than in other classes (e.g., no parking). Thus, it is crucial to improve the recall of an important class while maintaining overall accuracy. For this problem, we found that improving the separation of important classes relative to other classes in the feature space is effective. Existing methods that give a class-sensitive penalty for cross-entropy loss do not improve the separation. Moreover, the methods designed to improve separations between all classes are unsuitable for our purpose because they do not consider the important classes. To achieve the separation, we propose a loss function that explicitly gives loss for the feature space, called class-sensitive additive angular margin (CAMRI) loss. CAMRI loss is expected to reduce the variance of an important class due to the addition of a penalty to the angle between the important class features and the corresponding weight vectors in the feature space. In addition, concentrating the penalty on only the important class hardly sacrifices separating the other classes. Experiments on CIFAR-10, GTSRB, and AwA2 showed that CAMRI loss could improve the recall of a specific class without sacrificing accuracy. In particular, compared with GTSRB's second-worst class recall when trained with cross-entropy loss, CAMRI loss improved recall by 9%.
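The idea of penalizing only the important class can be illustrated with a small sketch. It assumes an ArcFace-style formulation in which logits are scaled cosines between the feature and each class weight vector, and a margin is added to the target angle only when the sample belongs to the important class. The abstract does not give the exact CAMRI formulation, so the function names, the `margin` and `scale` values, and the clamping of the angle at pi are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def angular_margin_loss(feature, class_weights, label, important_class,
                        margin=0.5, scale=10.0):
    """Softmax cross-entropy over scaled cosines, with an additive angular
    margin applied only to samples of the important class (assumed form)."""
    logits = []
    for j, w in enumerate(class_weights):
        cos_t = max(-1.0, min(1.0, cosine(feature, w)))
        if j == label and label == important_class:
            # Enlarge the target angle so the sample must sit closer to its
            # class weight vector to reach the same logit.
            theta = min(math.acos(cos_t) + margin, math.pi)
            cos_t = math.cos(theta)
        logits.append(scale * cos_t)
    # Numerically stable log-sum-exp for the cross-entropy.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical class weight vectors
    sample = [1.0, 0.2]                 # hypothetical feature of class 0
    print(angular_margin_loss(sample, weights, label=0, important_class=0))
    print(angular_margin_loss(sample, weights, label=0, important_class=1))
```

In this sketch a sample from a non-important class receives the plain scaled-cosine cross-entropy, so the extra penalty is concentrated on the important class, as the abstract describes.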
We propose GConvLoc, a WiFi fingerprinting-based indoor localization method utilizing graph convolutional networks. Using the graph structure, we can consider the fingerprint data of the reference points and their location labels in addition to the fingerprint data of the test point at inference time. Experimental results show that GConvLoc outperforms baseline methods that do not utilize graphs.
Fuxiang LIU Chen ZANG Lei LI Chunfeng XU Jingmin LUO
Because defogging algorithms perform differently at different fog concentrations, this paper proposes a fog image classification algorithm for a small, imbalanced dataset based on a convolutional neural network. It classifies fog images in advance so as to improve the effect and adaptive ability of image defogging algorithms in fog and haze weather. To address environmental interference, camera depth-of-field interference and uneven feature distribution in fog images, the CutBlur-Gauss data augmentation method and focal loss and label smoothing strategies are used to improve classification accuracy. The algorithm is compared with the machine learning algorithm SVM and the classical convolutional neural network classifiers AlexNet, ResNet34, ResNet50 and ResNet101. It achieves 94.5% classification accuracy on the dataset used in this paper, the best among the compared algorithms.
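The two loss-side strategies named above can be sketched in a few lines. The sketch assumes the commonly used forms: focal loss as -(1 - p_t)^gamma * log(p_t), and label smoothing as mixing the one-hot target with a uniform distribution. How the paper combines them, and its values for `gamma` and `eps`, are not stated in the abstract, so the two functions are shown separately and all parameter values are hypothetical.

```python
import math


def focal_loss(probs, label, gamma=2.0):
    """Focal loss for one sample: down-weights examples that are already
    classified with high confidence (common form, assumed here)."""
    p_t = probs[label]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def smooth_labels(label, num_classes, eps=0.1):
    """Label smoothing: (1 - eps) * one_hot + eps / num_classes."""
    off = eps / num_classes
    return [(1.0 - eps) + off if k == label else off for k in range(num_classes)]


if __name__ == "__main__":
    # Hypothetical softmax outputs for two fog-level predictions.
    print(focal_loss([0.9, 0.1], label=0))   # easy sample, small loss
    print(focal_loss([0.6, 0.4], label=0))   # harder sample, larger loss
    print(smooth_labels(label=1, num_classes=4))
```

Setting `gamma=0` reduces the focal loss to ordinary cross-entropy, which is one way to check an implementation of it.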
Yuta FUKUDA Kota YOSHIDA Hisashi HASHIMOTO Kunihiro KURODA Takeshi FUJINO
Deep learning side-channel attacks (DL-SCAs) have been actively studied in recent years. In DL-SCAs, deep neural networks (DNNs) are trained to predict the internal states of the cryptographic operation from side-channel information such as power traces. For a successful DL-SCA, it is important to select suitable DNN output labels that express the internal states. We focus on the multi-label method proposed by Zhang et al. for the hardware-implemented advanced encryption standard (AES). They used power traces from the AES-HD public dataset and reported revealing a single key byte under conditions in which the target key was the same as the key used for DNN training (profiling key). In this paper, we discuss an improvement for revealing all 16 key bytes under practical conditions in which the target key is different from the profiling key. As the experimental environment, we prepare a hardware-implemented AES without SCA countermeasures on an ASIC. First, our experimental results show that the DNN using multi-label does not learn side-channel leakage sufficiently from power traces acquired with only one key. Second, we report that the DNN using multi-label learns most of the side-channel leakage when three kinds of profiling keys are used, and all 16 target key bytes are successfully revealed even if the target key is different from the profiling keys. Finally, we applied the proposed method, DL-SCA using multi-label and three profiling keys, against a hardware-implemented AES with rotating S-boxes masking (RSM) countermeasures. The experimental results show that all 16 key bytes are successfully revealed by using only 2,000 attack traces. We also studied the reasons for the high performance of the proposed method against RSM countermeasures and found that the information from the weak bits is effectively exploited.
Fairuz SAFWAN MAHAD Masakazu IWAMURA Koichi KISE
3D reconstruction methods using neural networks are popular and have been studied extensively. However, the resulting models typically lack detail, reducing the quality of the 3D reconstruction. This is because the network is not designed to capture the fine details of the object. Therefore, in this paper, we propose two networks designed to capture both the coarse and fine details of the object to improve the reconstruction of its detailed parts. The first network uses a multi-scale architecture with skip connections to associate and merge features from other levels. The second network is a multi-branch deep generative network that separately learns the local features, the generic features, and the intermediate features through three different tailored components. In both architectures, the principle is to let the network learn features at different levels so that it can reconstruct both the fine parts and the overall shape of the 3D model. We show that both of our methods outperform state-of-the-art approaches.
Yoshiyuki TAJIMA Tomoki HAMAGAMI
Ordinal regression is used to classify instances by considering the ordinal relation between labels. Existing methods tend to lose accuracy when they adhere to the preservation of the ordinal relation. Therefore, we previously proposed a distributional knowledge-based network (DK-net) that considers the ordinal relation while maintaining high accuracy. DK-net focuses on image datasets. However, in industrial applications, one can find not only image data but also tabular data. In this study, we propose DK-neural oblivious decision ensemble (NODE), an improved version of DK-net for tabular data. DK-NODE uses NODE for feature extraction. In addition, we propose a method for adjusting the parameter that controls the degree of compliance with the ordinal relation. We experimented with three datasets: WineQuality, Abalone, and Eucalyptus. The experiments showed that the proposed method achieved high accuracy and a small MAE on the three datasets. Notably, the proposed method had the smallest average MAE on all datasets.
Mobile communication systems are not only the core of the Information and Communication Technology (ICT) infrastructure but also that of our social infrastructure. The 5th generation mobile communication system (5G) has already been launched and is in use. 5G is expected to serve various use cases in industry and society. Thus, many companies and research institutes are now trying to improve the performance of 5G, that is, 5G Enhancement, and to develop the next generation of mobile communication systems (Beyond 5G (6G)). 6G is expected to meet requirements that are highly demanding even compared with 5G, such as extremely high data rate, extremely large coverage, extremely low latency, extremely low energy, extremely high reliability, extremely massive connectivity, and so on. Artificial intelligence (AI) and machine learning (ML), together AI/ML, will have more important roles than ever in 6G wireless communications, given these extremely high requirements for a diversity of applications, including new combinations of the requirements for new use cases. We can say that AI/ML will be essential for 6G wireless communications. This paper introduces some ML techniques and applications in 6G wireless communications, mainly focusing on the physical layer.
Human skin visualization for the beauty industry using a smartphone and deep learning is discussed. Skin was photographed with a medical camera that could simultaneously capture RGB and UV images of the same area. Smartphone RGB images were converted into versions similar to medical RGB and UV images via a deep learning method called cycle-GAN, which was trained with the medical and the smartphone images. After the smartphone image was converted into a version similar to a medical RGB image using cycle-GAN, the processed image was further converted into a pseudo-UV image via a deep learning method called U-NET. Hidden age spots were effectively visualized in this image. RGB and UV images similar to medical images can thus be obtained with a smartphone. Once the deep learning networks are trained, a medical camera is not required.
Xin WANG Xiaolin HOU Lan CHEN Yoshihisa KISHIYAMA Takahiro ASAI
Channel state information (CSI) acquisition at the transmitter side is a major challenge in massive MIMO systems for enabling high-efficiency transmissions. To address this issue, various CSI feedback schemes have been proposed, including limited feedback schemes with codebook-based vector quantization and explicit channel matrix feedback. Owing to the limitations of feedback channel capacity, a common issue in these schemes is the efficient representation of the CSI with a limited number of bits at the receiver side, and its accurate reconstruction based on the feedback bits from the receiver at the transmitter side. Recently, inspired by successful applications in many fields, deep learning (DL) technologies for CSI acquisition have received considerable research interest from both academia and industry. Considering the practical feedback mechanism of 5th generation (5G) New Radio (NR) networks, we propose two implementation schemes for artificial intelligence for CSI (AI4CSI): a DL-based receiver and an end-to-end design. The proposed AI4CSI schemes were evaluated in 5G NR networks in terms of spectrum efficiency (SE), feedback overhead, and computational complexity, and compared with legacy schemes. To demonstrate whether these schemes can be used in real-life scenarios, both model-based channel data and practically measured channels were used in our investigations. When DL-based CSI acquisition is applied to the receiver only, which has little air interface impact, it provides approximately 25% SE gain at a moderate feedback overhead level. It is feasible to deploy it in current 5G networks during 5G evolution. For the end-to-end DL-based CSI enhancements, the evaluations also demonstrated additional SE gains of 6%-26% compared with DL-based receivers and 33%-58% compared with legacy CSI schemes. Considering its large impact on air-interface design, the end-to-end design will be a candidate technology for 6th generation (6G) networks, in which an air interface designed by artificial intelligence can be used.
Weiwei QI Shubin ZHENG Liming LI Zhenglong YANG
Bolts in the bogie box of metro vehicles are fasteners that are important to the bogie box structure. Detecting loosened bolts effectively at an early stage can prevent bolt loss and accidents. Recently, machine vision-based methods have been developed for detecting bolt loosening. However, traditional image processing and machine learning methods have high miss rates and false detection rates because the bolts are small and the background is complex. To address this problem, a loosened-bolt detection method based on deep learning is proposed. The proposed method cascades two stages in a coarse-to-fine manner: a localization stage, in which the Single Shot Multibox Detector (SSD) and an improved SSD sequentially localize the bogie box and the bolts, and a semantic segmentation stage, in which the U-shaped Network (U-Net) detects the looseness of the bolts. The accuracy and effectiveness of the proposed method are verified with images captured from the Shanghai Metro Line 9. The results show that the proposed method detects bolt loosening with higher accuracy, which helps ensure the stable operation of metro vehicles.
Xiaolin HOU Wenjia LIU Juan LIU Xin WANG Lan CHEN Yoshihisa KISHIYAMA Takahiro ASAI
5G has achieved large-scale commercialization across the world and the global 6G research and development is accelerating. To support more new use cases, 6G mobile communication systems should satisfy extreme performance requirements far beyond 5G. The physical layer key technologies are the basis of the evolution of mobile communication systems of each generation, among which three key technologies, i.e., duplex, waveform and multiple access, are the iconic characteristics of mobile communication systems of each generation. In this paper, we systematically review the development history and trend of the three key technologies and define the Non-Orthogonal Physical Layer (NOPHY) concept for 6G, including Non-Orthogonal Duplex (NOD), Non-Orthogonal Multiple Access (NOMA) and Non-Orthogonal Waveform (NOW). Firstly, we analyze the necessity and feasibility of NOPHY from the perspective of capacity gain and implementation complexity. Then we discuss the recent progress of NOD, NOMA and NOW, and highlight several candidate technologies and their potential performance gain. Finally, combined with the new trend of 6G, we put forward a unified physical layer design based on NOPHY that well balances performance against flexibility, and point out the possible direction for the research and development of 6G physical layer key technologies.
Joanna Kazzandra DUMAGPI Yong-Jin JEONG
Fine-grained image analysis, such as pixel-level approaches, improves threat detection in x-ray security images. In the practical setting, the cost of obtaining complete pixel-level annotations increases significantly, which can be reduced by partially labeling the dataset. However, handling partially labeled datasets can lead to training complicated multi-stage networks. In this paper, we propose a new end-to-end object separation framework that trains a single network on a partially labeled dataset while also alleviating the inherent class imbalance at the data and object proposal level. Empirical results demonstrate significant improvement over existing approaches.
Wenhao HUANG Akira TSUGE Yin CHEN Tadashi OKOSHI Jin NAKAZAWA
The crowdedness of buses is playing an increasingly important role in the disease control of COVID-19, and the lack of a practical approach to sensing it is a major problem. This paper proposes a bus crowdedness sensing system that exploits deep learning-based object detection to count the number of passengers getting on and off a bus and thus estimates the crowdedness of the bus in real time. In our prototype system, we combine the YOLOv5s object detection model with a Kalman filter object tracking algorithm to implement a sensing algorithm running on a Jetson Nano-based vehicular device mounted on a bus. Using driving recorder video data taken from a real bus, we experimentally evaluate the proposed sensing system and verify that it improves counting accuracy and achieves real-time processing on the Jetson Nano platform.
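The counting step that follows detection and tracking can be illustrated with a small sketch. It assumes that the tracker yields, for each passenger ID, a sequence of centroid y-coordinates, and that boarding and alighting are decided by the direction in which a track crosses a virtual line at the door. The abstract does not describe the paper's counting rule, so the virtual line, the direction convention, and the function name `count_crossings` are hypothetical.

```python
def count_crossings(tracks, line_y):
    """Count tracks crossing a virtual horizontal line in each direction.

    tracks: dict mapping a track ID to its list of centroid y-coordinates
            over consecutive frames (assumed tracker output).
    line_y: y-coordinate of the virtual counting line (assumption).
    Returns (boarded, alighted), where increasing y is treated as boarding.
    """
    boarded = alighted = 0
    for ys in tracks.values():
        for prev, cur in zip(ys, ys[1:]):
            if prev < line_y <= cur:
                boarded += 1      # crossed the line moving into the bus
            elif cur < line_y <= prev:
                alighted += 1     # crossed the line moving out of the bus
    return boarded, alighted


if __name__ == "__main__":
    # Hypothetical tracks: ID 1 boards, ID 2 alights, ID 3 never crosses.
    tracks = {1: [10, 40, 60], 2: [80, 55, 30], 3: [10, 20, 30]}
    on, off = count_crossings(tracks, line_y=50)
    print(on, off, "net change in occupancy:", on - off)
```

Counting on track crossings rather than on per-frame detections avoids counting the same passenger several times while they remain in view.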
Suraj Prakash PATTAR Tsubasa HIRAKAWA Takayoshi YAMASHITA Tetsuya SAWANOBORI Hironobu FUJIYOSHI
Predicting the grasping point accurately and quickly is crucial for successful robotic manipulation. However, to commercially deploy a robot, such as a dishwasher robot in a commercial kitchen, we also need to consider the constraints of limited usable resources. We present a deep learning method to predict the grasp position when using a single suction gripper for picking up objects. The proposed method is based on a shallow network to enable lower training costs and efficient inference on limited resources. Costs are further reduced by collecting data in a custom-built synthetic environment. For evaluating the proposed method, we developed a system that models a commercial kitchen for a dishwasher robot to manipulate symmetric objects. We tested our method against a model-fitting method and an algorithm-based method in our developed commercial kitchen environment and found that a shallow network trained with only the synthetic data achieves high accuracy. We also demonstrate the practicality of using a shallow network in sequence with an object detector for ease of training, prediction speed, low computation cost, and easier debugging.